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Abstract. Development of scientific software involves tradeoffs between ease of use, generality, and perfor- 
mance. We describe the design of a general hyperbolic PDE solver that can be operated with the convenience of 
MATLAB yet achieves efficiency near that of hand-coded Fortran and scales to the largest supercomputers. This 
is achieved by using Python for most of the code while employing automatically-wrapped Fortran kernels for 
computationally intensive routines, and using Python bindings to interface with a parallel computing library 
and other numerical packages. The software described here is PyClaw, a Python-based structured grid solver 
for general systems of hyperbolic PDEs ] 18]. PyClaw provides a powerful and intuitive interface to the algo- 
rithms of the existing Fortran codes Clawpack and SharpClaw, simplifying code development and use while 
providing massive parallelism and scalable solvers via the PETSc library. The package is further augmented by 
use of PyWENO for generation of efficient high-order weighted essentially non-oscillatory reconstruction code. 
The simplicity, capability, and performance of this approach are demonstrated through application to example 
problems in shallow water flow, compressible flow and elasticity. 

1. Introduction. Traditionally, scientific codes have been developed in compiled lan- 
guages like Fortran or C. There exists an abundance of well-tested, efficient, often serial 
implementations of numerical algorithms in those languages. It is often desirable to paral- 
lelize and extended such codes with new algorithmic components in order to apply them 
to increasingly challenging problems. 

More recently, high-level scientific programming languages, such as MATLAB, IDL, 
and R, have also become an important platform for numerical codes. These languages 
offer powerful advantages: they allow code to be written in a language more familiar 
to scientists and they permit development to occur in an evolutionary fashion. Problem 
parameters can be specified and plotting can be performed interactively, bypassing the 
comparatively slow edit /compile /run/ plot cycle of development in Fortran or C (231 . 
However, programs written in such high-level languages are not portable to high perfor- 
mance computing platforms and may be very slow compared to compiled code. 

We present one approach to leveraging the advantages of both kinds of languages. 
Our starting point is Clawpack ] 16 ] : a widely used, state-of-the-art package for solving 
hyperbolic systems of partial differential equations, such as those arising in fluid me- 
chanics, astrophysics, geodynamics, magnetohydrodynamics, oceanography, porous me- 
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dia flow, and numerous other fields. In this paper we present PyClaw, a package that 
brings greater accessibility, flexibility, and parallelization to Clawpack and related pack- 
ages. PyClaw is used as an illustration to describe, demonstrate, and provide support for 
a particular approach to construction of scientific software. The approach we advocate 
consists of three steps: 

(i) use Python to create a convenient interface to serial legacy code; 

(ii) use Python to interface the resulting code with software tools for parallelization 
of the code, with minimal modification of the serial legacy code; 

(iii) use Python to interface the resulting parallel code to other packages that provide 
extended functionality. 

In the case of PyClaw, (i) consists of a Python interface to the Clawpack and SharpClaw 
Fortran-based packages for numerical solution of systems of hyperbolic PDEs. This inter- 
face allows the code to be operated in the same convenient and interactive way that one 
works with MATLAB. In step (ii), PyClaw was parallelized by interfacing with PETSc, a 
state-of-the-art library for parallel scientific computing. This enables parallel computation 
that scales to large supercomputers and achieves on-core performance close to that of the 
legacy code. Finally, step (iii) is illustrated in that PyClaw was interfaced with PyWENO, 
to increase the available order of accuracy of numerical approximation. A key in all three 
steps is the use of the numerical Python package numpy [22 1. The idea of using a layer of 
numpy-based Python code on top of Fortran, C, or C++ kernels to solve PDEs has become 
increasingly popular over the past several years; see for instance Ifl9l |6j. We consider 
PyClaw to be a very convincing case study. 

PyClaw is one of the most highly scalable Python-based codes available. Perhaps 
the first well-known scientific project to provide a parallel solver in Python is GPAW, 
which extends the Python interpreter itself with parallel data structures and algorithms 
for electronic structure calculations [20|. Python has also previously been used as a tool 
for parallelizing Fortran and C codes by introducing parallel code in a Python layer that 
also calls the Fortran/C kernels [21]. In the FiPy package, parallelism is achieved by us- 
ing an existing parallel library (Trilinos) through its Python bindings |6|. PyClaw takes 
an approach similar to that of FiPy, in which all parallel operations over distributed- 
memory processes are handled by the PETSc library through the petsc4py Python package 
http://code.google.eom/p/petsc4py/ l. This approach offers the advantages of utiliz- 
ing a documented, well-designed abstract parallel interface for developers that is already 
known to achieve excellent scaling on many architectures. 

The algorithms of Clawpack and its high-order extension, SharpClaw, are described 
in Section [2] The PyClaw framework is described in Section [3] Section |4] describes the 
parallelization of PyClaw. As demonstrated in Section|5j PyClaw maintains both the serial 
performance of Clawpack and the parallel scalability of PETSc. In Section [6j we briefly 
highlight some of the software development practices that have contributed to the success 
of PyClaw. The combination of wave propagation algorithms and scalable parallelization 
enables efficient solution of interesting scientific problems, as demonstrated through three 
examples in Section [7] 

A repository containing the data, software, and hardware environments for repro- 
ducing all experimental results and figures in this paper is available online ( |http : //] 



Listing 2: Solution of a 2D Euler Riemann problem 
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from clawpack import pyclaw 
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from clawpack import riemann 
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# Define 


the equations and the solution algorithm 




/I 


solver = 


pyclaw . ClawSolver2D (riemann . rp2_euler_4wave) 




r 
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# Define 


the problem domain 
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domain = 


pyclaw.Domain([0. ,0.] , [1 . 


,1.] , [100,100]) 




rr 

/ 


solver . all_bcs = pyclaw . BC . extrap 






o 




solution 


= pyclaw . Solut ion ( solver . 


num_eqn , domain) 
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# Set ph 


ysical parameters 






10 


gamma = 


1.4 






11 


solution 


. problem_data [' gamma ' ] = 


gamma 




12 


# Set initial data 






13 


xx, yy = 


domain . grid . p_centers 






14 


1 = xx <0 


.5; r = xx>=0.5; b = yy<0. 


5; t = yy>=0.5 




15 


solution 


.q[0,...] = 2.*l*t + l.*l*b + l.*r*t + 3.*r*b 




16 


solution 


. q [1 , . . . ] = . 75*t - . 75*b 




17 


solution 


. q [2 , . . . ] = 0.5*1 - . 5*r 






18 


solution 


. q [3 , . . . ] = . 5* solution . q 


[0, . . .]* \ 




19 




( solution . q [1 , 


. . .]**2+solution.q[2, . 


. . ] **2) \ 
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+ 1 . / ( gamma - 1 . 


) 




21 


# Solve 








22 


solver . evolve_to_time ( sol ut ion , tend=0 . 3) 




23 


# Plot 








24 


pyclaw . p 


lot . interact ive_plot () 







|bitbucket .or g/ahmadia/pycla w-sisc-rr). The most recent release of the PyClaw code 
is hosted at http: //github. com/clawpack/pyclaw and can be installed alongside its de- 
pendencies in the clawpack distribution in a few minutes with the following pip com- 
mands (pip is a freely available Python package installer and manager): 



Listing 1 : Installing the most recent release of PyClaw and its dependencies 



pip install numpy 
pip install clawpack 



The petsc4py package is not an explicit dependency of PyClaw, but if it has been 
installed, the PetClaw parallel extension described in Section [4] is seamlessly enabled. 

It is our hope that readers will download and try the code. To whet the reader's 
appetite, an example of a complete PyClaw program is shown in Listing [2] This example 
sets up, runs, and plots the solution of a two-dimensional inviscid fluid dynamics problem 
(specifically, test case 6 of HT3| ). 
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2. Finite Volume Hyperbolic PDE solvers. The numerical methods in PyClaw com- 
pute approximate solutions of systems of hyperbolic conservation laws: 

x(x)q f + V-f(q,x) x = s(q,x). (2.1) 

Here q(x, f) is a vector of conserved quantities (e.g., density, momentum, energy) and 
f(q, x) represents the flux (modeling wave-like phenomena), while s(q, x) represents ad- 
ditional non-hyperbolic source terms, such as diffusion or chemical reactions. The capacity 
function k(x) is frequently useful for taking into account variations in material properties 
or in the use of non-uniform grids (see [15 Chapter 6]). Here we describe high-resolution 
shock capturing methods, which is one of the most successful classes of numerical meth- 
ods for solving Q2.1) . 

Computing solutions to nonlinear hyperbolic equations is often costly. Solutions of 
\2.1\ generically develop singularities (shocks) in finite time, even if the initial data are 
smooth. Accurate modeling of solutions with shocks or strong convective character re- 
quires computationally expensive techniques, such as Riemann solvers and nonlinear lim- 
iters. 

In a finite volume method, the unknowns at time level t n are taken to be the averages 
of q over each cell: 

Q? = ^£' + ^(x,t")dx, (2.2) 

x i-\ 

where Ax = x ;+ i — x ; _i and i are the local grid spacing and the cell's index, respec- 
tively. A simple update of the cell averages based on the resulting waves gives the classic 
Godunov method, a robust but only first-order accurate numerical scheme: 




where F and At are the numerical flux function and the time step. Godunov's method 
results from taking a particular choice of F referred to as the upwind flux. 

The first-order method just described is very dissipative. Higher-order extensions 
require the computation of higher derivatives of the solution or flux. Near a solution 
discontinuity, or shock, spurious oscillations tend to arise due to dispersion and numerical 
differencing across the discontinuity. In order to combat this, shock-capturing methods 
use special nonlinear algorithms to compute numerical derivatives in a non-oscillatory 
way by limiting the the value of the computed derivative in the vicinity of a discontinuity 

ma. 

The classic Clawpack algorithm is based on the second-order Lax-Wendroff difference 
scheme that was later extended by LeVeque IfMl [15] . This scheme can be written in the 
flux-differencing form | |2.3) by an appropriate choice of numerical flux, which has the 
form 



— ^upwind "H ^correction/ (2-4) 



where F upw int j is the Godunov flux. The classic Clawpack algorithm is based on mod- 
ifying \2.ty by applying a limiter to ^correction- However, in LeVeque's extension, rather 
than applying the limiter to the flux variables, the limiter is applied directly to the waves 
computed by the Riemann solver. This allows for better accuracy by limiting only the 
characteristic families that are discontinuous in a given neighborhood. Furthermore, the 
first-order contribution is written in terms of fluctuations (which approximate the quasi- 
linear term Aq x ) rather than fluxes (which approximate f(cj)). This allows the algorithm 
to be applied to hyperbolic systems not in conservation form. 

While the Lax-Wendroff approach can be extended to even higher order, this is cum- 
bersome because of the large number of high order terms appearing in the Taylor series. A 
simpler alternative is the method of lines, in which the spatial derivatives are discretized 
first, leading to a system of ODEs that can be solved by traditional methods. This is the 
approach taken in SharpClaw [10, 9 11 1. First, a non-oscillatory approximation of the so- 
lution is reconstructed from the cell averages to give high order accurate point values just 
to the left and right of each cell interface. This reconstruction is performed using weighted 
essentially non-oscillatory (WENO) reconstruction in order to avoid spurious oscillations 
near discontinuities. As in the classic Clawpack algorithm, the scheme is written in terms 
of fluctuations rather than fluxes, so that it can be applied to non-conservative problems. 

2.1. Clawpack and SharpClaw. We now describe the "legacy" Fortran codes on which 
PyClaw is built. 

The classic algorithm implemented in Clawpack ("Conservation Laws Package") in- 
cludes extensions to two and three dimensions, adaptive mesh refinement, and other en- 
hancements ll6l . The unsplit multidimensional Clawpack algorithms include additional 
correction terms, computed by a secondary "transverse" Riemann solver, which approx- 
imates corner transport terms. The Clawpack software (http://www.clawpack.org) and 
its extensions, consisting of open source Fortran code, have been freely available since 
1994. More than 7,000 users have registered to download Clawpack. 

Clawpack is a very general tool in the sense that it is easily adapted to solve any 
hyperbolic system of conservation laws. The only specialized code required in order to 
solve a particular hyperbolic system is the Riemann solver routine. A wide range of 
Riemann solvers, including several for the most widely studied hyperbolic systems, have 
been developed by Clawpack users and are also freely available. Clawpack handles not 
only simple Cartesian grids but any logically quadrilateral grid, provided that there is a 
map from a uniform Cartesian domain to the desired physical domain (see Section |7?l) . 
Non-hyperbolic source terms (s(q, x)) can be easily included via operator splitting. For 
more examples and details regarding Clawpack, see [14] and [15, Chapter 23]. 

The high-order WENO-based wave propagation method is implemented in Sharp- 
Claw, another Fortran package designed similarly to Clawpack and which makes use 
of the same Riemann solver routines. The default options in SharpClaw employ fifth- 
order WENO reconstruction in space and the fourth-order strong stability preserving 
(SSP) Runge-Kutta method of 1 8 1 in time. In multi-dimensions SharpClaw requires prop- 
agation of waves only in the normal direction to each edge. 
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3. PyClaw. PyClaw is an object-oriented framework that incorporates the function- 
ality of Clawpack and SharpClaw. This functionality may be provided via either pure 
Python or calls made to the underlying Fortran routines included in Clawpack and Sharp- 
Claw. PyClaw is designed such that the user provides appropriate call-back functions, 
such as a Riemann solver, that describe and implement the problem in question, PyClaw 
than manages the "main" routine including appropriate time stepping and output. It also 
avoids the need to deal with strictly formatted data files and reduces the need to write 
custom Fortran routines for new problems. Instead, problems can be set up interactively 
or in a simple scripting language. PyClaw also allows for simulation and visualization 
to be done in a single, interactive environment. Users may engage with this framework 
at different levels, depending on their expertise and requirements. These interfaces are 
described in Section [37TI 

PyClaw wraps the full functionality of the "classic" ID, 2D and 3D Clawpack code 
including the use of capacity functions, mapped grids, and both dimensionally-split and 
fully-multidimensional algorithms. It also provides the full functionality of SharpClaw, 
and adds to this with higher order WENO reconstruction. It does not presently include 
adaptive mesh refinement, which is part of the AMRClaw and GeoClaw extensions of 
Clawpack. 

3.1. Interfaces. The PyClaw distribution includes pre-written application scripts that 
solve problems in acoustics, elasticity, compressible flow, shallow water flow, and other 
application domains. These application scripts represent the typical "main" routines that 
lower-level language developers are used to, and are written with ease of understanding 
as their most important goal. These scripts run both on serial workstations and from 
batch processing queues for, e.g., 8,000-node parallel jobs without modification. Novice 
users can solve a wide range of problems by modifying the example scripts to deal with 
different domains, initial conditions, boundary conditions, and so forth. This requires 
only simple understanding of high-level scripting code, but allows users to compute so- 
lutions of complex problems on large supercomputers. The scripts for the applications in 
this paper have all been added to the distributed examples, and we plan to continue this 
practice with respect to future publications that use PyClaw. 

Novice or advanced users may also run problems and analyze results in an interactive 
shell. When Python is invoked with standard input connected to a tty device, it reads and 
executes commands interactively. This feature simplifies serial development, debugging, 
and visualization, and is familiar to users of commercial software such as MATLAB and 
Mathematica. PyClaw's top level classes present the same API whether used in serial 
or parallel, allowing users to develop interactively, then run production jobs in batch 
environments. 

Advanced users may want to solve a hyperbolic system that is not included among the 
example applications. In Clawpack a number of Riemann solvers have been written for 
specific systems and are included with Clawpack. Since PyClaw and Clawpack Riemann 
solvers are interoperable, many systems of interest have already been implemented and 
can be used immediately in PyClaw. A user also has the option of writing his/her own 
Riemann solver in Fortran which can then be utilized in Clawpack as well as PyClaw. 
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Clawpack users who wish to run an existing serial Clawpack application in parallel can do 
so easily by wrapping any problem-specific Fortran routines (such as those that set initial 
conditions or compute source terms) automatically with f2py and using the resulting 
Python function handles in PyClaw. 

Numerical analysts are often interested in comparing solutions of a problem obtained 
with different methods or different values of method parameters. The sample application 
scripts provide a common functional interface that can be accessed from the command 
line for selecting between solvers, choosing serial or parallel computation, and other op- 
tions. Users are free to extend this interface to allow more programmatic flexibility at 
the command line or from within a batch environment. PyClaw also fully exposes the 
full range of command line options available from the underlying PETSc library, allow- 
ing advanced users to tweak low-level settings such as message-passing communication 
strategies. 

Frequently, research scientists are interested in comparing the performance of numer- 
ical methods. PyClaw enables this comparison by allowing scientific developers to extend 
the software with a new Solver class. The Solver class, described in the next section, is 
responsible for prescribing a single time step of the numerical algorithm. In PyClaw this 
is accomplished by implementing a homogenous_step routine which evolves the solution 
of the PDE from time f to t + At. Since most existing codes have such a routine already, it 
is often straightforward to include legacy code in the PyClaw framework by simply wrap- 
ping this function. Non-hyperbolic terms s(q, x) can also be incorporated via operator 
splitting. This is the most common way to extend the Solver class and allows for easy 
comparison between different numerical methods. 

3.2. Classes. The primary abstractions used in PyClaw are the Solver and the Solu- 
tion. The Solver class provides an abstract interface to evolve a Solution object forward in 
time. The Solution class is a data abstraction containing information about the domain of 
the PDE and the state of the solution at a particular time inside the domain. Here we will 
discuss these classes and how they interact. 



The role of the Solver is illustrated in Figure 3.1(b) The Solver class prescribes how a 
State Q" is evolved forward in time to obtain Q" +1 ; in general this consists of three parts: 

(i) set ghost cell values based on the prescribed boundary conditions; 

(ii) advance the solution based on the hyperbolic terms (i.e., qt + V ■ f(q, x) = 0); 

(iii) advance the solution based on the source term s(q, x) (if present) by a fractional- 
step approach |15, Chapter 17]. 

The base Solver class implements the basic interface to each of these functions and a 
subclass of the Solver class is expected to implement the appropriate functions depending 
on the numerical method being implemented. The Solver class is sufficiently abstract to 
accommodate algorithms that are based on the method of lines, such as SharpClaw, as 
well as algorithms that are not, such as Clawpack. 

The Solution class, depicted in Figure [3. 1(a) has two purposes: 

• Describe the problem domain 

• Keep track of the values of the state variables Q" and PDE coefficients 

Like Clawpack, PyClaw simulations are always based on a computational domain com- 
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posed of tensor products of one-dimensional equispaced discretizations of space. More 
general physical domains may be used as long as it is possible to map them to that 
computational domain. PyClaw includes a set of geometry classes that implement these 
abstractions. 

The solution values Q n and the PDE coefficients are contained in numpy arrays stored 
in the Solution object. Thus the full Solution class represents a snapshot of the gridded 
data. The class acts as a container object with one or more Grid and State objects such as 
in the case of adaptive mesh refinement or nested grids, both of which are possible with 
Clawpack algorithms, though not yet available in PyClaw. This hierarchical class structure 
allows the underlying data and algorithms to be modified without the knowledge of the 
interacting objects and without changing the interface presented to the user. An example 
of this is the PetClaw State object, which reimplements State functionality over distributed 
memory. In the future, we intend to provide State objects that implement other strategies 
for accelerating performance. 



Solution 




Grid 2 




Dimension x ^) 





(a) Structure of a PyClaw Solution object, which 
may contain multiple Grid objects, each of which 
may have multiple associated State objects. Each 
State object has a associated fields of conserved 
quantities (e.g., density, momentum) and option- 
ally, associated auxiliary property fields. 



Solution at 



Solver 



Riemann 
Solver 

Boundary 
Conditions 

Source 
Terms 

Other Solver 



Solution at 

t = tn+1 



(b) Role of Solver object. The Solver acts on a So- 
lution object in order to evolve it to a later time. 



Fig. 3.1: Structure and function of the main PyClaw classes. 

3.3. Extension using PyWENO. One of the principal achievements of PyClaw has 
been to facilitate the extension of Clawpack and SharpClaw by interfacing with other 
packages. For example, PyWENO |3| has been used to add much higher-order function- 
ality to the existing SharpClaw code, within PyClaw. 

The Fortran code SharpClaw contains only fifth-order WENO routines for evaluation 
at cell interfaces. New WENO routines for PyClaw were generated by PyWENO, which 
is a standalone package for building custom WENO codes. For a given (arbitrarily high) 
order of reconstruction, PyWENO symbolically computes the smoothness coefficients, re- 
construction coefficients, and optimal (linear) weights of the WENO method. From these 
symbolic results, PyWENO generates Fortran kernels that perform the WENO reconstruc- 
tions (it can also generate C and OpenCL kernels). The generated kernels can optionally 
perform individual steps of the reconstruction process (i.e., computing smoothness in- 
dicators, nonlinear weights, or the final reconstruction) or combinations thereof. This 
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affords authors some flexibility in avoiding redundant computations or minimizing mem- 
ory allocations and accesses. Furthermore, the points within each cell at which the WENO 
reconstruction is performed are also arbitrary. Negative weights are automatically split 
into positive and negative parts |26|, allowing PyWENO to generate kernels for routines 
for arbitrary sets of points (such as arbitrary order Gauss-Legendre quadrature points). 

For PyClaw, odd-order WENO routines to approximate the solution at the left and 
right edges of each cell were generated from fifth to seventeenth order. All aspects of 
the WENO reconstruction are wrapped into standalone subroutines and no temporary 
work arrays are allocated. Using these routines is trivially easy; for instance, to use the 
9th-order WENO method instead of the classic algorithm in Listing [2] one needs only to 
replace line 9 by the two lines 



Listing 3: Using SharpClaw (with PyWENO) 



solver = pyclaw . SharpClawSolver ID () 
solver . weno_order = 9 



4. Parallelization. Like many finite difference and finite volume codes, Clawpack 
and SharpClaw implement boundary conditions through the use of ghost cells. In this 
approach, fictitious layers of cells are added around the edge of the problem domain; 
the number of layers depends on the width of the stencil of the numerical scheme. At 
the beginning of each step, the ghost cell values are set to satisfy the specified boundary 
conditions. Then the numerical scheme is applied on all the interior (non-ghost) cells. 
Many types of boundary conditions can be handled well in this manner, including peri- 
odicity, reflection, and outflow (non-reflecting). Custom boundary conditions may also be 
specified, for instance to model time-dependent inflow. 

This approach is highly amenable to parallelization, since it is based on the idea that 
information at the edge of a domain is filled in by a routine that is independent of the 
rest of the numerical scheme. Therefore, the serial kernels can be applied on each proces- 
sor of a distributed parallel machine as long as some routine first fills the ghost cells on 
the processor either by appeal to boundary conditions or through communication with 
neighboring processors, as appropriate. Only this ghost cell routine needs to know the 
global topology of the problem; the serial kernels operate based entirely on local infor- 
mation. This orthogonality allows independent development of serial numerical schemes 
and parallel communication patterns, and is a key strategy in combining the work of 
computational mathematicians and computer scientists. 

The same global-local decomposition is employed in PETSc. The PETSc library in- 
cludes a DMDA object that implements parallelization through the use of ghost cells. The 
DMDA is a highly scalable class for data layout across parallel, structured grids. All stor- 
age allocation and communication of ghost values is handled by the DMDA, but storage is 
returned to the PyClaw program as numpy arrays so that no code changes are necessary 
and the user experience is identical. In Figure 4.1 we show three different representations 
of data over a simple 5x6 structured grid. The global ordering is used as input to PETSc 
linear and nonlinear solvers, whereas the natural ordering is used for output since it is 
independent of the particular parallel partition. Local ordering is used to extract data over 
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Fig. 4.1: A simple 5x6 structured mesh is represented using a DMDA. The leftmost 
figure shows the natural numbering of unknowns, for instance used by most visualiza- 
tion packages, which is independent of the parallel layout. The middle figure shows the 
PETSc global numbering of unknowns, which is contiguous for each process. The right- 
most figure shows the local numbering of unknowns for process 0, which includes ghost 
unknowns shared with neighboring processes. 



a "halo" region, including ghost unknowns shared with other processes. 

This is, in fact, how PyClaw makes use of the DMDA structure. Local vectors are 
extracted with a given number of overlap unknowns, and computations are performed 
using the same serial routines. These local vectors are then used to update a global vector, 
and PETSc performs the appropriate accumulation for shared unknowns. This simple 
mechanism in PETSc for integrating local and global data (which works also for unstruc- 
tured grids) allows easy parallelization. Thus PyClaw relies on Clawpack and SharpClaw 
to provide computational kernels for time-dependent nonlinear wave propagation and 
on PETSc (through petsc4py) to manage distributed data arrays and the communication 
between them. The data structures in PETSc and Clawpack/ SharpClaw are directly inter- 
faced through the Python package numpy [22]. 

The parallel extension of PyClaw consists of only about 300 lines of Python code. Any 
PyClaw script can be run in parallel simply by replacing the statement from clawpack 
import pyclaw with 



Listing 4: Running in parallel 



1 from clawpack import petclaw as pyclaw 



and invoking the Python script with mpirun. 

The serial PyClaw routines handle discretization, Riemann solves, limiting and re- 
construction, since they only depend on local data. PETSc handles parallel layout and 
communication, but has no information about the local computations. PETSc allows fine- 
grained control of the ghost value communication patterns so that parallel performance 
can be tuned to different supercomputing architectures, but by default a user does not 
need to manage parallelism or see PETSc code. In fact, the PetClaw user is shielded from 
PETSc in much the same way that a PETSc user is shielded from MPI. This separation can 
enable future development. For instance, an unstructured mesh topology of hexahedral 
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elements could be managed by PETSc, using a Riemann solver which could accommodate 
deformed elements, without changes to PyClaw. 

In addition to communication of ghost cell values, parallel hyperbolic solvers require 
communication of the maximum wave speed occurring on each processor in order to 
check whether a prescribed stability condition (generally phrased in terms of the Courant 
number) has been satisfied and choose the size of the next time step appropriately This 
is also handled by a single PETSc call. 

Although the basic Clawpack and SharpClaw algorithms are explicit and require no 
algebraic solver, a powerful advantage gained by using PETSc for parallelization is the 
possibility of employing PETSc's solvers for implicit integration of hyperbolic problems 
that are stiff due to source terms or fast waves that do not need to be accurately resolved. 
This is the subject of ongoing work. 

A particular challenge of using Python is that most parallel debuggers support only 
C or Fortran, making parallel debugging of Python codes difficult |4|. This is yet another 
motivation for using a tested parallel library like PETSc. 

5. Performance. A few previous works have considered efficiency of scientific Python 
codes in serial as well as in parallel; see for instance |1]|12H2T1|4|. Those studies consisted 
mainly of simple code snippets run in serial or on up to a few dozen processors until the 
recent work (4|, which includes scalability studies up to 16,384 cores. In this section, we 
investigate the efficiency of a full object-oriented Python framework (PyClaw) compared 
with hand-coded Fortran (Clawpack). We also consider the scaling of PetClaw on all 
65,536 cores of the Shaheen supercomputer at KAUST. 

We consider only the second-order classic Clawpack algorithm here, as we are mainly 
interested in the effect of using a Python framework (in the serial case) and the cost of 
communication (in the parallel case). In terms of these factors, roughly similar results 
may be expected for the performance of the higher order algorithms, and preliminary 
tests (not described here) indicate good scaling of those also. 

5.1. Serial performance. For a detailed serial performance comparison of an explicit 
stencil-based PDE code in Python, see [12] . In that work, vectorized numpy code was 
found to be fast enough for some operations, while wrapped Fortran loops performed 
identically to a pure Fortran code. In contrast to the simple kernel code considered there, 
we present tests of a full object-oriented solver framework. Our results thus extend those 
of 1 12 1, providing an indication of the efficiency that can be expected for a sophisticated 
Python-based PDE solver framework. 



Table 5.1 shows an on-core serial comparison between the Fortran-only Clawpack 
code and the corresponding hybrid PyClaw implementation for two systems of equations 
on two different platforms. The hyperbolic systems considered are the 2D linear acoustics 
equation and the 2D shallow water (SW) equations ITT51 . The acoustics test involves a very 
simple Riemann solver (amounting to a 3 x 3 matrix-vector multiply) and is intended to 
provide an upper bound on the performance loss arising from the Python code overhead. 
The shallow water test involves a more typical, costly Riemann solver (specifically, a Roe 
solver with an entropy fix) and should be considered as more representative of realistic 
nonlinear application problems. Clawpack and PyClaw rely on similar Fortran kernels 
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that differ only in the array layout. Because most of the computational cost is in executing 
the low-level Fortran kernels, the difference in performance is relatively small - though 
not negligible. The results for the Shallow water equations are in rough agreement with 
the 10% overhead reported in J4). A 10-30% increase in computational time (for realistic 
applications) seems well worth the advantages provided by the use of Python (in partic- 
ular, easy parallelism). The overhead is expected to be even smaller for more complex 
systems or in three dimensions. 

Table 5.1: Timing results (in seconds) for on-core serial experiments solving acoustics and 
shallow water problems implemented in both Clawpack and PyClaw on Intel Xeon and 
the IBM BlueGene/P PowerPC 450 processors 



Application 


Processor 


Clawpack 


PyClaw 


Ratio 


Acoustics 


Intel Xeon 


28s 


41s 


1.5 


PowerPC 450 


192s 


316s 


1.6 


Shallow Water 


Intel Xeon 


79s 


99s 


1.3 




PowerPC 450 


714s 


800s 


1.1 



5.2. Parallel performance. We now investigate the parallel performance of PetClaw 
on the Shaheen supercomputer at KAUST, an IBM BlueGene/P system consisting of 65,536 
cores. When characterizing the performance of scientific codes on supercomputers, a 
commonly used characterization is that of weak scalability, which is assessed by studying 
how the run time of the code is affected when the resolution of the simulation and the 
number of processors is increased commensurately to maintain a fixed amount of work 
per processor. The parallel efficiency is given by dividing the run time of the single 
processor job by the run time of the parallel job. 

The problem used for the comparisons is of a compressible, inviscid flow that consists 
of a shock impacting a low-density bubble (examined in detail in Section [7!2| . We inves- 
tigate weak scaling by running the problem for a fixed number of time steps and with a 
fixed number of grid cells (400 x 400 = 160,000) per core, while increasing the number 
of cores from one up to the whole machine. Figure 5.1 shows the results, with parallel 
efficiency provided in the last row. It is important to note that the time required to load 
the necessary Python packages and shared objects, which occurs only once at the begin- 
ning of a simulation (or series of batch simulations) has been excluded from the results 
presented here. This load time is discussed in the next section. 

Observe that in all parallel runs, more than 90% of the time is spent in the computa- 
tional kernels. The parallel operations scale extremely well: the the CFL condition-related 
reduction takes essentially the same amount of time for all runs from 16 processors up, as 
does the communication of ghost cell values in localToGlobal. Together these parallel op- 
erations consume about 6% of the total run time. Parallel initialization consists of PETSc 
parallel object construction, including memory allocation and MPI communicator initial- 
ization. Note that the parallel initialization, while significant in these artificial test runs, 
will not contribute significantly to the cost of real simulations because it is a one-time cost. 
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92.6 


92.7 
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1.0 
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0.97 


0.97 


0.92 



Number of Cores 



Fig. 5.1: Weak scaling performance profile of the shock bubble problem with 160,000 grid 
cells per core 

5.3. Dynamic Loading. As alluded to already, the work of loading Python libraries 
and objects dynamically at run-time does not currently scale well on the Shaheen sys- 
tem. Large-scale supercomputers such as Shaheen rely on parallel file systems that are 
designed to support large distributed loads, with each process independently accessing 
data. Dynamic loading does not follow this pattern because every process is attempting 
to access the same data simultaneously. This issue was partially addressed in [4], but an 
implementation capable of supporting dynamic library loading is still lacking. 

The dynamic loading time for the PetClaw runs in Section[7|is less than 5% of the total 
simulation time, and this will generally be the case for 2D wave propagation problems be- 
cause the CFL condition means that large simulations of hyperbolic problems necessarily 
require long run times in order for waves to propagate across the full domain. 
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6. Software Development Practices. The majority of software development practices 
utilized in PyClaw are inherited from the open source software community. The com- 
munity's atmosphere of free and open sharing complements the tradition of scientific 
inquiry. In fact, growing movements within the scientific community seek to embrace 
scientific reproducibility for software tools used in conducting mathematical and scientific 
research E El 

In addition to making our results reproducible, we also intend that our software be 
useful as a reference for understanding numerical methods involved in solving hyper- 
bolic PDEs and as a platform for extending and applying these techniques. As such, we 
also seek to provide a friendly and inviting context for scientists working in this cross- 
disciplinary environment to conduct their research. 

6.1. Application Tests. The goal of reproducibility in research is to improve not only 
confidence in results, but their extensibility as well. A series of regression tests have been 
devised for every major application where PyClaw has been used. The script and param- 
eters for generating the test are stored in the repository (typically in a single file), along 
with verified output from a successful run. Where possible, the output is verified against 
a different solver implementation or by analysis. These "application tests" produce the 
same output regardless of the choice of solver type, kernel implementation, or computer 
architecture. The python-nose (http://code.google.eom/p/python-nose/) unit testing 
framework simplifies development and selective execution of the tests. Programmatic test 
code generation is used to exercise the full range of solver and kernel options for each 
test. Scientific results are archived as application tests within the unit testing framework, 
ensuring that our published results are reproducible in current and future versions of the 
PyClaw solver. 

In our experience, the application tests are the single greatest factor in facilitating 
adoption, refactoring, and extension of the code. New users are confident that they have 
a working installation (or can tell us what doesn't work on their architectures) and are 
capable of reproducing our published scientific results. Developers refactoring the code 
for improved organization, performance, or readability can rely on the application tests 
for regression testing, to ensure that their changes have not incidentally broken anything. 
Perhaps most importantly, new solver methods and their implementations can be verified 
against known solutions with the application tests. This facilitates and encourages the 
development of new ideas within the PyClaw framework. 

6.2. Hosted Distributed Version Control. Our use of git ( |http : //git - scm . com/) , a 
modern, distributed version control system, provides many benefits. Development need 
not be synchronized through a master server, which makes it easier to incorporate sub- 
projects from developers loosely attached to the core team. Management of releases and 
bugfix updates has been greatly simplified. However, perhaps the greatest beneficiary is 
the user. Users do not have to wait for PyClaw releases in order to retrieve bugfixes for 
particular machines or improvements which are under active development, they need only 
update to a given changeset. Moreover, a user can easily switch between an approved re- 
lease and experimental code for comparison with a single version control command. This 
allows the user a much finer-grained manipulation of versioning than was previously 
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possible. 

There are many excellent open source distributed version control hosting sites, in- 
cluding Bitbucket (http://www.bitbucket.orgl and GitHub (http://www.github.orgl, 
which provide a range a services to both developers and community users. PyClaw 
leverages the services provided at GitHub, which includes wiki webpages for user com- 
munication, as well as a bug reporting and tracking infrastructure integrated with the 
hosted version control repository. We have separately engaged the use of Google Groups 
to provide a mailing list for the PyClaw user and developer community. 

6.3. Documentation. PyClaw is provided with a range of documentation suitable for 
the variety of users interacting with the software. While this paper provides a high-level 
overview of the capabilities of the code and its application, it is our experience from 
using other projects that the best software documentation includes a separate tutorial and 
user's guide with a class reference section. The tutorial and user's guide are maintained 
in the Restructured Text format, from which they can be translated into HTML, PDF, and 
several other output formats using for instance Sphinx (http : //sphinx . pocoo . org/ ). The 
PyClaw code itself is documented inline using Python's docstring conventions, allowing 
us to automatically generate class and module reference sections for our documentation. 

7. Applications. The numerical algorithms made accessible in PyClaw, empowered 
by parallelization, are capable of modeling challenging wave propagation phenomena. In 
this section, we provide three example applications. They are not intended to break new 
scientific ground, but rather to demonstrate the versatility of the algorithms accessible 
through PyClaw, and (in the third application) the power of PetClaw as a scalable parallel 
solver. 

7.1. Shallow Water Flow on the Sphere. Classical shallow water equations on a 
sphere are an approximation of the flow on the earth's surface. They are of interest 
because they capture most of the flow's features of a thin layer of fluid tangent to the 
surface of the sphere. Therefore, they are often used by meteorologists, climatologists and 
geophysicists to model both atmosphere and oceans. 

In three-dimensional Cartesian coordinates, using h and u = (u,v,w) T to define the 
height and the fluid velocity vector, the shallow water equations are of the form < |2.1) , with 
q = (h,hu,hv,hw) T and 



/ hu hv hw \ 

hu 2 + jgh huv huw 

huv hv 2 + ^gh hvw 

\ huw hvw hw 2 + jgh J 



(7.1) 



where g is the gravitational acceleration. The source term s (q, x) includes the Coriolis 
force and an additional term that ensures that the velocity is tangent to the sphere: 

20 

s (q,x) = — — z (x x hu) + (x ■ (V ■ f)) x. (7.2) 

Here O and a are the rotation rate and the radius of the earth, respectively. In \7.2\ f is 
the part of the flux matrix associated with the momentum equations Q. 
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In this framework, we consider the numerical solution of the zonal wave number 
4 Rossby-Haurwitz problem ||30ll . Rossby-Haurwitz waves are steadily propagating ini- 
tial conditions of the nonlinear non-divergent barotropic vorticity equation on a rotating 
sphere [7J. Although they do not represent exact steady solutions of the shallow water 
equations, they are expected to evolve in a similar way J29J- For this reason Rossby- 
Haurwitz waves have also been used to test shallow water numerical models and are 
among the standard shallow water model test cases proposed by Williamson et al. [30 j. 

The problem consists of a divergence-free initial velocity field that rotates about the 
z— axis without significantly changing form on short time scales (dynamical weak insta- 
bilities). On longer time scales (50-100 days) the instabilities effects lead to the breakdown 
of the wave structure. Previous studies have shown that the time at which the solution 
symmetry breaks depends strongly on the numerical truncation error of the scheme 1 29 1 . 
Because of this the Rossby-Haurwitz problem is frequently used to assess the accuracy 
of a numerical algorithm for the solution of the shallow water equations on a rotating 
sphere. 

The three-dimensional shallow water equations are solved on the logically rectangu- 
lar grid introduced in [2], using the same approach employed there, namely the classic 
Clawpack method with Strang splitting for the source term |28|. Simulations have been 
performed on four grids with 100 x 50, 200 x 100, 400 x 200 and 800 x 400 cells. Table 



7.1 lists the breakdown time. With the coarsest mesh, the diffusive part of the numerical 
error suppresses the instability completely (cfr. [29]). The finer grid results confirm that 
the time at which the initial velocity loses its symmetry is sensitive to the numerical error. 

Table 7.1: Time at which the symmetry of the Rossby-Haurwitz wave breaks down. 



Grid 


Breakdown 


100 x 50 




200 x 100 


« 34 d 


400 x 200 


w 45 d 


800 x 400 


w46d 



Figure 7.1 shows the contour line of the water height at day (initial solution), 38, 45 
and 48, for the grid with 400 x 200 cells. These plots show qualitatively the evolution of 
the instability. 

7.2. Shock-Bubble Interaction. The Euler equations for a compressible, inviscid fluid 
with cylindrical symmetry can be written as 



Pt + {pu) z + (pv) r 



pv 
r 



/ \ , ? \ / \ P uv 

{pu)t + {pU A + p) z + (puv) r = 



(pa)t + (puv) z + (pv 2 + p) r = pl 



r 

2 (7.3) 



(pE) t + ((pE + p)u) z + ((pE + p)v) r 
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r 

(pE + P)v 
r 




(c) 45 days. (d) 48 days. 

Fig. 7.1: Water height of the zonal wave number 4 Rossby-Haurwitz problem at different 
times; grid with 400 x 200 cells. Contour interval 120 m. 

Here the z-coordinate represents distance along the axis of symmetry while the r-coordinate| 
measures distance away from the axis of symmetry. The quantities p, E, p represent den- 
sity total energy and pressure, respectively while u and v are the z- and r-components of 
velocity. 

We consider an ideal gas with 7 = 1.4 in the cylindrical domain [0,2] x [0,0.5]. The 
problem consists of a planar shock traveling in the z-direction that impacts a spherical 
bubble of lower-density fluid. In front of the shock u = v = and p = p = 1 except inside 
the bubble, where p = l,p = 0.1. Behind the shock, p = 5,p ~ 2.82, v « 1.61, and these 
conditions are also imposed at the left edge of the domain. In addition to \7.3\ , we solve 
a simple advection equation for a tracer that is initially set to unity in the bubble and 
zero elsewhere in order to visualize where the fluid interior to the bubble is transported. 
Reflecting boundary conditions are imposed at the bottom of the domain while outflow 
conditions are imposed at the top and right boundaries. 
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Since the edge of the bubble is curved, it does not align with the Cartesian grid. 
Thus, in the cells that are partly inside and partly outside the bubble, the initial condi- 
tion used is a weighted average of the different solution values, based on the fraction 
of the cell that is inside. This fraction is computed by adaptive quadrature using the 
scipy . integrate . quad package. 



Figure 7.2 shows the initial condition and results of this problem, using a 1280 x 320 
grid and the unsplit classic Clawpack algorithm with full transverse corrections. The bub- 
ble is observed to transform into a "smoke ring". Considerably more detailed structure 
is evident in this simulation compared to lower-resolution adaptively refined results from 
AMRClaw that are published at http : //depts . Washington . edu/clawpack/clawpack-4 . 



3/applications/euler/2d/shockbubble/amr/www/index . html Figure 7.3 shows a close- 



up of the smoke ring solution obtained with the classic Clawpack algorithm, as well as 
solutions obtained using the SharpClaw algorithm with fifth- (WEN05) and seventh-order 
(WEN07) reconstructions. All runs were performed on a 1280 x 320 grid with a maxi- 
mum CFL number of 0.8. Although the overall features of the solutions are similar, more 
fine structure is apparent in the SharpClaw results. For instance, several vortices can be 
seen to the left of the smoke ring in the WEN07 run that are not resolved in the classic 
run. 

7.3. Cylindrical Solitary Waves in a Periodic Medium. The problem considered in 
this section is taken from |24J. It involves the propagation of nonlinear waves in a two- 
dimensional crystal, leading to the formation of solitary radial waves or "rings". For this 
purpose, we consider the 2D p-system with spatially-varying coefficients as a model for 
elastic waves: 

e t - u x - v v = 0, 
p{x,y)u t - cr(e,x,y) x = 0, (7.4) 
p{x,y)v t - a{e,x,y) y = 0. 

Here e represents the strain, u and v are the velocities in x and y respectively, p(x, y) is the 
spatially-varying material density and cr(e, x, y) is the stress. The system ( |7.4) is closed by 
introducing a nonlinear constitutive relation die, x,y). Similar to |17J, we take 

cr(e,x,y) = exp(K(x,y)e) + 1, (7.5) 

where K(x, y) is the spatially-varying bulk modulus. 



The medium, a small part of which is shown in Figure 7.4(a) is in a checkerboard 
pattern with alternating squares of two different materials: 



(K(x,y),p{x,y)) 



The problem is quite challenging, for multiple reasons. First, the flux is spatially varying 
and even discontinuous - meaning that the solution variables (strain and momentum) 
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(a) Initial tracer showing location of low-density bubble. 



Tracer at time t = 0.67500000 




(b) Tracer showing location of of bubble material after shock impact. 



Density at time t = 0.67500000 




z 



(c) Schlieren plot of density. 

Fig. 7.2: Results of shock-bubble interaction computation, showing the transformation of 
the initially spherical bubble into a "smoke ring" after it is impacted by a shock wave. 



are also discontinuous. Furthermore, these discontinuities include corner singularities. 
Finally, and owing in part to these discontinuities, it is necessary to use a large number of 
cells per material period (> 100) in order to get even qualitatively accurate results. As we 
are interested in a phenomenon that arises only after waves pass through many (> 100) 
material periods, this leads to a very high computational cost. 

The problem is solved using the SharpClaw algorithm with fifth-order WENO re- 
construction. As explained in Section [2T] the implementation in SharpClaw is based on 
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(a) Classic Clawpack solution. 
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(b) SharpClaw solution using WEN05 reconstruc- (c) SharpClaw solution using WEN07 reconstruc- 
tion, tion. 



Fig. 7.3: Schlieren plots of density, zoomed in on the smoke ring. All solutions are com- 
puted on a 1280 x 320 grid. 



solving normal Riemann problems at the grid interfaces; see |25J for a detailed explana- 
tion of the approximate Riemann solvers employed and for a much more detailed study 
of this problem. 

A close-up view of the initial stress is shown in Figure 7.4(b) The stress is a Gaussian 
pulse with an amplitude of 5 and a variance in x and y of 5, centered at the origin. The 
velocity is initially zero. The problem is symmetric with respect to reflection about the 
x- and y- axes, so the computational domain is restricted to the positive quadrant and 
reflecting boundary conditions are imposed at the left and bottom boundaries. Outflow 
(zero-order extrapolation) conditions are imposed at the top and right boundaries. In 
units of the medium periodi, the domain considered is 200 x 200, and the grid spacing is 
Ax = Ay = 1 /240. Hence the full simulation involves 6.8 x 10 9 unknowns. It was run on 
16,384 cores of the Shaheen supercomputer at KAUST, over a period of about 3.2 days. 

The formation of solitary wave rings is seen clearly in Figure 7.5(a) which depicts the 
stress at t — 200. The structure of these waves is shown in Figure 7.5(b) which displays 
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(a) Detail of checkerboard medium. 
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(b) Detail of initial stress. 



Fig. 7.4: The medium and initial condition (both shown zoomed-in) for the cylindrical 
solitary wave problem. 
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(a) Solitary wave train (stress shown). 



(b) One-dimensional slices of the stress, 
black line: 45°; dashed red line: 0°. 
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Fig. 7.5: Solution of the p-system 
solitary wave formation. 

slices of the stress at 45° (solid line) and 0° (dashed line) with respect to the x-axis. 

8. Discussion and future plans. This work demonstrates that the use of Python in 
combination with existing Fortran and C codes allows the production of scientific software 
that is accessible, extensible, and efficient. The serial performance loss is relatively small, 
and is more than compensated for even on a typical laptop by the ability to run in parallel 
without any additional effort. Combining scalable parallelism with the algorithms of 
Clawpack and SharpClaw yields a potent tool for exploring novel wave phenomena. 

We are in the process of extending PyClaw to include three-dimensional wave prop- 
agation and implicit time stepping. Both of these are straightforward steps, since the Py- 
Claw framework was written with such developments in mind and the related software 
packages (Clawpack and PETSc) already support these features. Preliminary implemen- 
tations are under testing. 

21 



7.4 1 in a checkerboard medium, showing 2D cylindrical 



The use of Python in scientific computing has many potential benefits |23|. The 
Python packages numpy scipy and matplotlib offer essential numerical tools with inter- 
faces familiar to users of MATLAB (the lingua franca of numerical methods) in a general- 
purpose programming language. An increasing number of important libraries (like PETSc 
and Trilinos) now have Python bindings, making it relatively easy to add powerful capa- 
bilities like massively scalable parallelism to Python codes. As discussed in Section |6j the 
Python community promotes a range of positive code development practices that are not 
common in scientific teams but are often picked up by those who begin to work in Python 

While the existence of multiple scientific codes for solving the same problems is 
healthy it is widely recognized that increased code sharing and reuse would benefit the 
numerical analysis and scientific computing communities. Closer integration of code de- 
veloped by different groups would not only allow researchers to be more productive (by 
reducing duplication of effort), but would also allow useful algorithmic improvements 
to be more rapidly distinguished from insignificant ones by simplifying the task of com- 
paring them. In our experience, the adoption of Python as a high-level scientific coding 
language dramatically increases opportunities for code-sharing and reuse. Indeed, the 
results described in this paper consist largely of combining a few powerful existing pieces 
of scientific software. 
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